Genetic variation as a long-distance modulator of RAD21 expression in humans

Somatic mutations and changes in expression of RAD21 are common in many types of cancer. Moreover, sub-optimal levels of RAD21 expression in early development can result in cohesinopathies. Altered RAD21 levels can result directly from mutations in the RAD21 gene. However, whether DNA variants outside of the RAD21 gene could control its expression and thereby contribute to cancer and developmental disease is unknown. In this study, we searched for genomic variants that modify RAD21expression to determine their potential to contribute to development or cancer by RAD21 dysregulation. We searched 42,953,834 genomic variants for a spatial-eQTL association with the transcription of RAD21. We identified 123 significant associations (FDR < 0.05), which are local (cis) or long-distance (trans) regulators of RAD21 expression. The 123 variants co-regulate a further seven genes (AARD, AKAP11, GRID1, KCNIP4, RCN1, TRIOBP, and USP32), enriched for having Sp2 transcription factor binding sites in their promoter regions. The Sp2 transcription factor and six of the seven genes had previously been associated with cancer onset, progression, and metastasis. Our results suggest that genome-wide variation in non-coding regions impacts on RAD21 transcript levels in addition to other genes, which then could impact on oncogenesis and the process of ubiquitination. This identification of distant co-regulation of oncogenes represents a strategy for discovery of novel genetic regions influencing cancer onset and a potential for diagnostics.

www.nature.com/scientificreports/ While GWAS reports susceptibility loci for a phenotype, traditional GWAS has several limitations, including an inability to discern a causal variant within the many linked SNPs in the risk locus. This may prevent true gene-trait associations from being identified. By integrating regulatory mechanisms such as spatial associations and eQTLs, which measure the ability of a variant to regulate the expression of target genes, the confluence of these data types can illuminate the underlying influence on the process of pathogenesis. Thus, eQTLs dissect genetic mechanism of variants implicated in multiple diseases and to prioritize SNPs or genes for further functional experiments. We hypothesised that by leveraging these regulatory associations, we could elucidate cohesin-associated pathologies which are affected by subtle, combinatorial changes in the regulation of RAD21. Here, we link the three-dimensional (3D) structure of the genome with eQTL data to identify common variants within the genome that affect the transcriptional levels of RAD21. Some of these eQTLs have previously been associated with disease pathways. However, their regulatory relationships with RAD21 open up novel avenues for exploration of disease onset for both cohesinopathy disorders and cancer.

Results
Variant selection for genome-wide search for distant regulators of transcription. To test whether variants across the genome had a significant effect on the transcription of RAD21 we performed a genome-wide search of all 42,953,834 SNPs in dbSNP151 (as available in GTEx v8 20 ) for a spatial-eQTL association with the RAD21 transcript level ( Fig. 1; Supplementary Table S1). We identified 123 SNPs that were significantly associated with RAD21 transcript levels from various genomic distances (120 cis, 1 trans-intrachromosomal, and 2 trans-interchromosomal; Fig. 2a; FDR < 0.05). Of note, the trans-intrachromosomal connection was with a locus that was > 16 Mb away from RAD21 within the linear sequence of chromosome 8 in eQTL tissue from the adrenal gland. The trans-interchromosomal connections involved SNPs that are located on chromosomes 11 and 13, in eQTL tissues from the hypothalamus (chr 11) and transformed lymphocytes (chr 13). Despite there being no similar standard for genome-wide eQTL studies, the red (p < 5 × 10 -8 ) and blue (p < 1 × 10 -5 ) dashed lines represent the typical p-value thresholds from GWAS studies as a reference.  Table S2). This resulted in the discovery of a hub of eight co-regulated genes: AARD, AKAP11, GRID1, KCNIP4, RAD21, RCN1, TRIOBP, and USP32 (Fig. 2b). The two SNPs (rs238256 and rs10639528) with trans-interchromosomal connections to RAD21 are responsible for the associations with AKAP11 and RCN1 transcript levels. Notably, SNPs regulating RAD21 in cis were associated with a trans-interchromosomal eQTL connection with GRID1. Of note, GRID1 is < 500 kb away from WAPL on chromosome 10q23.2 (WAPL protein facilitates cohesin's removal from chromatin).
Gene set enrichment analysis implicates the Sp2 transcription factor as a common regulator of these gene promoters. The promoter regions (± 1 kb from TSS) of RAD21 and the seven co-regulated genes we identified in this analysis were all significantly enriched for Sp2 regulatory motifs (Supplementary  Table S3; GGSNNGGG GGC GGG GCC NGNGS; Transfac putative transcription factor binding site M09658; p = 0.03047). This suggests that the level at which Sp2 acts is upstream of RAD21 and the co-regulated genes. Sp2 is a DNA binding transcription factor in the Sp subfamily, which are required for the expression of cell cycleand developmentally-regulated genes, and deregulated expression Sp family members is associated with human tumorigenesis 23 . Notably, it has recently been found that Sp2 is significantly upregulated in cohesin (STAG2) mutant CMK cells following 4 h of WNT3A treatment 24 . AKAP11 and USP32 are linked to regulation of cohesin via STRING protein-protein interaction networks. STRING analysis (Fig. 3) identified that two genes in this study have been shown to be co-expressed (KCNIP4 and GRID1; RNA co-expression score = 0.135) 25 . Additionally, AKAP11 and RAD21 are linked through a common co-expression with cohesin genes PDS5A (RNA co-expression score = 0.208) and PDS5B (RNA co-expression score = 0.203). Finally, RAD21 and USP32 are linked through a high-confidence Protein-Protein interaction between USP32 and SMC1A 26 (SMC1A is part of the cohesin complex with RAD21).
The cohesin complex and its co-regulated genes are enriched for loss-of-function intolerance. The gnomAD catalog categorizes the probability of loss-of-function (LoF) mutation intolerance (pLI) as a pLI ≥ 0.9 with 3,063 out of 19,704 genes (15.5%) having LoF-intolerance 4 . gnomAD also reports the observed/expected score 90% confidence interval for continuity across the spectrum of selection that can distinguish selection from low sample size bias. Using the 90% upper bound of the loss-of-function confidence interval (LOEUF), LoF mutation intolerance is defined as LOEUF < 0.35. For example, the LOEUF scores can differentiate the 678 genes essential for human cell viability (mean LOEUF of 0.63) compared to the 777 non-  Table S4). Of the seven co-regulated genes we identified in our analysis, three (42.9%) were LoF intolerant (Supplementary Table S4; AKAP11 pLI = 0.97878; GRID1 pLI = 0.99931; and USP32 pLI = 1.0). The other four were somewhat tolerant (AARD: pLI = 0.32580; KCNIP4 pLI = 0.82064) or completely LoF-tolerant (RCN1: pLI = 2.8821 × 10 -5 ; TRIOBP: pLI = 2.0161 × 10 -28 ). Including RAD21, that is 4 out of 8 genes in the analysis with LoF-intolerance, a significantly larger proportion than expected by chance based on pLI (p = 0.02, fisher exact test). Notably, the SP2 gene is also LoF-intolerant (pLI = 1.0). Using LOEUF, 4 of the 7 co-regulated genes are below the average essential gene LOEUF score (0.63) while none are above the average non-essential gene LOEUF (1.34) (Supplementary Table S4). SP2 is also below 0.63.

Discussion
An insufficiency of RAD21 that is not lethal can contribute to cohesinopathy or cancer. Here, we searched for genomic variation with subclinical moderation of RAD21 transcript levels to explore how modification of RAD21 enhancers across the genome could impact healthy human development and disease. Amongst the 42,953,834 genomic variants that mark the genome, we found significant (FDR < 0.05) spatial-eQTL associations with transcription of RAD21 in 123 variants. This analysis identified many local (cis) regulatory regions, but also three loci that were long-distance regulators of RAD21 expression. Overall, these 123 variants co-regulated a further seven genes, whose promoters were enriched for Sp2 transcription factor binding sites. Notably, these findings were found in specific cell lines (Hi-C) and across 25 eQTL tissues, which highlights the importance of using tissue-specific data, not just the most accessible tissue (blood). These results highlight the potential roles of transcription factor Sp2 and deubiquitination (via USP32) in RAD21 transcript level regulation.
The role of Sp2 as a common transcription factor. Sp2 primarily localizes at CCAAT motifs, and the CCAAT box binding transcription factor Nf-y is the major partner for Sp2-chromatin interactions 27 . The Sp2 transcription factor gene ontology (GO) annotations include DNA-binding transcription factor activity, RNA polymerase II-specific, and histone deacetylase binding. Sp2 is almost universally expressed across all human tissues, so modification of Sp2 levels might not require mechanisms with cell-type specificity. Although the www.nature.com/scientificreports/ role of Sp2 in normal human tissues has not been well examined, Sp2 knockouts in zebrafish are embryonically lethal 23 indicating that it is an essential gene, which is also supported in humans via the high pLI in gnomAD (pLI = 0.99721). We propose that the enrichment of these seven genes with Sp2 transcription factor binding sites results from one of two scenarios: 1. modification of RAD21 expression alters Sp2 regulation, the downstream effects of which modify the transcript levels of these co-regulated genes. 2. modification of Sp2 itself alters pathways upstream of the seven co-regulated genes, of which RAD21 is one.
As it has recently been shown that the knockdown of STAG2 in CMK cells was correlated with an upregulation of Sp2, this would argue for a direct correlation between cohesin levels and Sp2 24 .
The role of deubiquitination and USP32. In our analysis, three SNPs with cis-regulatory relationships with RAD21 also co-regulated USP32, a ubiquitin-specific protease. Protein ubiquitylation is a post-translational modification with an important role in regulating protein degradation. Ubiquitylation is a reversible process, removed by deubiquitylating enzymes, of which one class is the ubiquitin-specific proteases (USPs). Ubiquitination of the cohesin complex helps to regulate chromosome segregation and cohesion during mitotic progression through control of replication fork integrity and the cellular response to replication stress 28,29 . While previous findings have pointed to Ubiquitin-specific proteases USP13 and USP37, the mechanisms and effects of cohesin ubiquitination remain largely undefined 29 . Of note, in mitosis RAD21 is poly-ubiquitinated (targeting RAD21 for degradation), but the role of USPs in this hasn't been fully elucidated. Thus, USP32, a poorly characterized deubiquitinating enzyme, might also play a role. For example, both USP37 and USP32 can reduce resistance to cisplatin-targeting therapies in cancer 30 , suggesting the involvement of USP32 in drug resistance. Further evidence to this role of USP32 and cohesin has been found in a screen of deubiquitinating enzyme interactions, where USP32 had a high-confidence Protein-Protein interaction with SMC1A 26 (which has been found to be mono-ubiquitinated in mitosis 29 ). Additionally, USP32 mutations have been found in human leukemia cells, which frequently carry recurrent cohesin deficits, including mutations in SMC1 31 and RAD21 32 .
The role of USP32 in cohesin regulation could also be implicated through mutual calcium-dependence. While not significant in the gene set enrichment, it is notable that KCNIP4, RCN1, and USP32 are all involved in the same molecular function: calcium ion binding. The role of calcium with USP32 is undefined, but USP6 (97% sequence identity with USP32 as a chimeric fusion of USP32 and TBC1D3 33 ) induces tumorigenesis in a calcium-dependent manner through interaction with the ubiquitous calcium-binding protein calmodulin 34 . This calcium-ion dependence could be important in the co-regulation of RAD21 activity, as one mechanism for the dissolution of the cohesin complex from chromatids is through calpain-1, a RAD21 peptidase that cleaves RAD21 at L192 in a calcium-dependent manner 35 .

Role of RAD21-associated genes in cancer.
In total, six of the seven co-regulated genes and Sp2 have connections to cancer onset, progression, and metastasis. Overall, results suggest that genome-wide variation in enhancer regions could impact on cohesin function as well as other co-regulated genes which could impact on oncogenesis.
The enrichment for LoF-intolerance amongst the RAD21 co-regulated genes could be associated with an importance of these genes to normal development and cancer. The three LoF-intolerant genes (AKAP11, GRID1, and USP32) are all mutated in 5-8% of all Endometrial cancers in TCGA 36 . GRID1 is also at the breakpoint of several structural translocations such as t(10;20)(q23;q13) DPM1/GRID1 in Acute Myeloid Leukemia 37 . USP32 is overexpressed in breast cancer and human small cell lung cancer and may serve as an oncogene through promoting cell proliferation and tumor metastasis 38,39 . Many AKAP proteins, including AKAP11, are also commonly mutated in breast cancer 40 .
Beyond the three LoF-intolerant genes, three other RAD21 co-regulated genes also have putative roles in cancer. TRIOBP has been identified in a range of different cancers including lung carcinoma, glioblastoma, esophageal, pancreatic, prostate, lung, and breast cancer 41 . RCN1 inhibits IP3R1-mediated ER calcium release and thereby inhibiting ER stress-induced apoptosis, which is disrupted by mutation in breast, colorectal, kidney, and liver cancer 42 . The KCNIP4 gene is disrupted by a translocation (t(3;4)(p13;p15)) in renal cell cancer 43 and by a mutation in squamous cell lung cancer 44 .
In cancer, the expression pattern of Sp2 is inversely correlated with CEACAM1 expression in prostate cancer cells, likely due to its role in recruiting histone deacetylase to the CEACAM1 promoter (downregulation of a tumor suppressor gene) 45 . More recently, Sp2 disruption was shown to promote invasion and metastasis of hepatocellular carcinoma, possibly through TRIB3 46 .

Conclusion
By exploring 42,953,834 variants across the genome, we identified 123 that modify RAD21 transcript levels, and also co-regulate 7 genes (AARD, AKAP11, GRID1, KCNIP4, RCN1, TRIOBP, and USP32). These genes share key molecular pathways with RAD21, including being regulated by transcription factor Sp2 binding, having a role in deubiquitination, and oncogenesis. In addition, RAD21 and its co-regulated genes are enriched for being mutationally constrained. This work demonstrates that the association of RAD21 with these 7 co-regulated genes is likely to be functionally important, and has potential for disruption in pathologies such as cancer.  20 ) for an association with transcription of RAD21 (Fig. 4).
Identification of SNP-gene spatial relationships. For all variants, spatial regulatory connections were identified through association with transcript levels of RAD21 (expression Quantitative Trail Locus [eQTL]; GTEx v8 20 ) and a confirmed spatial interaction (Hi-C data) using the CoDeS3D algorithm (https:// github. com/ Genom e3d/ codes 3d-v1) 22,47 . Spatial-eQTL association p-values were adjusted using the Benjamini-Hochberg procedure, and associations with adjusted p-values < 0.05 were deemed spatial eQTL-eGene pairs. Variants with a minor allele frequency below 5% were filtered out due to sample size restrictions within GTEx.
To identify SNP locations in the Hi-C data, reference libraries of all possible Hi-C fragment locations were identified through digital digestion of the hg38 human reference genome with the same restriction enzyme employed in preparing the Hi-C libraries (i.e. MboI, HindIII). Digestion files contained all possible fragments, from which a SNP library was created, containing all genome fragments containing a SNP. Next, all SNP-containing fragments were queried against the Hi-C databases to find distal fragments of DNA which spatially connect to the SNP-fragment. If the distal fragment contained the coding region of RAD21 (GENCODE transcript model GrCH38 positions), a SNP-RAD21 spatial connection was confirmed. There was no binning or padding around restriction fragments to obtain gene overlap.
Defining mutationally-constrained genes. The human transcriptome consists of genes with varying levels of redundancy and critical function, resulting in some genes being intolerant to loss-of-function (LoFintolerant) mutation (e.g. RAD21). This loss-of-function intolerance is a phenomena described by the gnomAD team at the Broad Institute, whereby certain genes are rarely found to contain truncating mutations in healthy populations 4 . This subset of the human transcriptome are posited to also be more intolerant to regulatory perturbation. The gnomAD catalog lists 19,704 genes and their likelihood of being intolerant to loss-of-function mutations (pLI), resulting in 3,063 LoF-intolerant, 16,134 LoF-tolerant, and 507 undetermined (15.5% LoF-intolerance, defined as pLI ≥ 0.9) 4 . We tested all genes co-regulated with RAD21 (as defined above) for LoF-intolerance, comparing our cis and trans-acting eQTL gene lists for enrichment for LoF-intolerance. This analysis highlights the significance of long-distance gene regulation on otherwise mutationally-constrained (LoF-intolerant) genes, such as RAD21.
Gene ontology (GO), pathway analysis, and functional prediction. All genes were then annotated for significant biological and functional enrichment using g:Profiler 48 , which includes the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database (https:// www. kegg. jp/ kegg/ pathw ay. html) for pathways and TRANSFAC for transcription factor binding enrichment.

Protein-protein interaction (PPI) networks.
To test for previously known protein-protein interactions associated with our gene lists, all genes were used as inputs into STRING v11 with connection stringency set to low confidence (0.150) and up to 5 interactors added to the network 25 .